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Recent years have seen vast improvements in the ability of rigorous quantum-mechanical meth- 
ods to treat systems of interest to molecular biology. In this review article, we survey common 
computational methods used to study such large, weakly bound systems, starting from classical 
simulations and reaching to quantum chemistry and density functional theory. We sketch their 
underlying frameworks and investigate their strengths and weaknesses when applied to poten- 
tially large biomolecules. In particular, density functional theory — a framework that can treat 
thousands of atoms on firm theoretical ground — can now accurately describe systems dominated 
by weak van der Waals interactions. This newfound ability has rekindled interest in using this 
tried-and-true approach to investigate biological systems of real importance. In this review, we 
focus on some new methods within density functional theory that allow for accurate inclusion of 
the weak interactions that dominate binding in biological macromolecules. Recent work utilizing 
these methods to study biologically-relevant systems will be highlighted, and a vision for the 
future of density functional theory within molecular biology will be discussed. 
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1. Introduction 

The scientific disciplines (e.g. biology, chemistry, 
physics) once stood well separated from each other, 
with practitioners from each approaching different 
questions in different ways. These divisions are be- 
ginning to blur, however, as answers to questions 
from one field increasingly require techniques and 
knowledge built up in another. There is evidence 
of this effect in the increasing need for interdisci- 
plinary collaborations to solve problems arising in 
distinct fields. A particularly poignant example of 
this blurring of lines is the field of molecular biology, 
where researchers try to build an understanding of 
biological systems starting at the molecular level. 
Concepts from chemistry and physics arise natu- 



rally in such endeavors and this has bred a symbi- 
otic relationship between biologists, chemists, and 
physicists, who now seek to answer similar ques- 
tions. ^21 

Some of the most important questions arising 
in this arena relate to the structure and function of 
biological macromolecules .l^lfSl For example, for ra- 
tional drug design to be viable, a detailed knowl- 
edge of the interactions between a target protein 
and a potential drug molecule is necessary to un- 
derstand whether the drug will bind to the protein 
at the right location and in the right way.l^ Prom 
there, an atomic- level understanding of the protein 
itself is necessary to understand how allosteric ef- 
fects turn a drug binding event into a change in the 
behavior of the protein.'^ These details cannot come 
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from a top down investigation of the molecules, nor 
can they come from simply observing the changing 
behavior as a function of drug binding. Part of the 
physics lies in the statistical mechanics of protein 
conformations, and part resides in the communica- 
tion networks within the protein, the elucidation of 
which hinges on the detailed physics of the binding 
event and the transmission of information from the 
binding site to a possibly distant effector site. 

The same can be said of the "holy grail" of 
molecular biology — understanding protein folding 
and how the structure of a protein relates to its 
function. Coarse-grained models provide some in- 
sight into the process of protein folding,'^ but a 
true understanding of the process and the ability 
to reliably predict how a protein will fold requires 
an atomic-level understanding of the interactions 
within a particular protein. There are many other 
examples of the need for atomistic detail in molecu- 
lar biology. Ultimately, all properties of biological 
macromolecules — such as DNA, RNA, proteins — 
are governed by minute details involving the atomic 
and electronic structure of their constituent parts 
as well as the interactions between neighboring 
pieces of the molecule. Even dynamic conforma- 
tional changes that may be essential to a particular 
process are ultimately governed by these interac- 
tions and similar interactions with the surrounding 
environment. 

Developing an atomic-level understanding of 
large molecular systems is not an easy task and, 
until recently, the application of accurate quantum 
mechanical methods to such systems was infeasi- 
ble. This review highlights recent advances made 
in the fields of computational physics and physical 
chemistry that can aid in building such an under- 
standing. After discussing classical simulations and 
common quantum-chemistry approaches, we focus 
specifically on advances within density functional 
theory (DFT) — a framework used successfully for 
decades in the field of condensed matter physics — 
which affords unprecedented accuracy and utility 
in treating large, weakly bound molecular systems. 
These new methods will be discussed and paired 
with a survey of their use on biologically-relevant 
molecular systems. Possible future applications of 
these methods will also be addressed. 



2. Survey of Common 

Computational Methods 

2.1. Classical simulations 

For many purposes, the best present-day meth- 
ods to study biologically-relevant systems are clas- 
sical force field models. Such methods allow one 
to study the large-scale dynamics of systems with 
perhaps millions of atoms over biologically-relevant 
timescales.l^This is by far the most common compu- 
tational method of study for macromolecules, and 
has provided indispensable insight into numerous 
biological systems. 

The main goal of a force field is simple; to 
represent the energy and forces of a collection of 
atoms using a physically-motivated, yet relatively 
straight-forward, algebraic expression. This simplic- 
ity is what allows the simulation of large systems 
over significant timescales.'^ Generally, the physi- 
cal motivation for terms in the energy Hamiltonian 
come from macroscopic physics. For example, many 
force fields treat bond stretches and angle fiexes as 
classical harmonic oscillators obeying Hooke's law. 
This is, in fact, what is meant by the phrase classi- 
cal force field. 

In its most basic form, a typical force field can 
be written as a sum of separate contributions to the 
total energy,'2E31 [ q 

Eg = -E'bonds + -E^anglcs ~l~ -2'dihedrals ~l~ -^non-bonded 

b 

a 



+ ^^kd[l + cos{n<j)-6)] 

d 




The first term on the right hand side of Eq. ([T]) rep- 
resents an harmonic oscillator (with spring constant 
kh) in bond length between each pair of covalently- 
bonded atoms within the system. The second does 
the same for the three-atom angle term. Dihedral 
angles are treated with a fairly shallow periodic 
potential, represented by the third term. The last 
line of Eq. ([1]) represents non-bonded interactions 
and includes a coulomb term for charge-charge in- 
teractions and a Lennard-Jones (6-12) potential 
to account for van der Waals type interactions. 
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Some force fields add additional terms for out-of- 
plane motions (improper dihedrals) or higher-order 
termsP Variations on the functional form are also 
sometimes applied. For example, a Morse potential 
can be used in place of the harmonic bond term to 
allow for bond breaking during a simulation.'^'^ 

For all their usefulness, so called "Class I" force 
field approaches suffer from some drawbacks. First, 
treating microscopic phenomena using macroscopic 
theory is, in essence, a mean-field approach. The 
quantum-mechanical interactions between electron 
clouds are averaged over. This, along with the as- 
sumed form for all physical interactions, does not 
allow new physics to be uncovered. The only physics 
present in the simulation is what was explicitly in- 
cluded, meaning one cannot gain any true atomic- 
level insight into the underpinnings of interesting 
phenomena. Second, the simplicity of the mean-field 
approach used in force field simulations means that 
they are generally incapable of transferably achiev- 
ing chemical accuracy.'^ While bulk motions and 
general trends can often be gleaned from such sim- 
ulations, the precise movements and behavior of 
atoms are probably not accurate. This poses a sig- 
nificant problem for applications such as drug de- 
sign, where one seeks to find a small molecule (an 
enzyme inhibitor perhaps) that binds with a certain 
affinity to a site in the protein. 

Biophysicists and biochemists have already 
made substantial headway against this problem.'^! 
Originally, force fields included the partial charge 
on an atom as a fitting parameter. The charge was 
assumed fixed during the simulation so effects of 
polarization could not be treated. The next genera- 
tion of force fields incorporates the ability of charge 
to rearrange during a simulation. Such polarizable 
force fields incorporate some of the quantum effects 
necessary to accurately model molecular systems. 
One example of this new type of force field is the 
AMOEBA force field, which includes both static 
and dynamic polarizabilities and represents a signif- 
icant step towards accurate energetics from a force 
field.'I^ In addition, newer force fields often include 
cross-terms that account for how changes in one in- 
ternal coordinate affect other energy terms. These 
help improve accuracy and transferability but can- 
not correct for the lack of an explicit quantum me- 
chanical treatment. 



2.2. Incorporating quantum 
mechanics 

The obvious solution to the shortcomings of the 
classical force field methods is to directly include 
quantum mechanics in calculations. Therefore, the 
solution to the problem is straight-forward; one sim- 
ply has to solve the time-independent Schrodinger 
equation 

H\^) = e\^), (2) 

where the Hamiltonian H in atomic units is given 
by 
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i,j \n-Rj\ 



f^j\Ri-Rj\ 



(3) 



Here, lower-case letters represent electronic degrees- 
of-freedom, upper-case letters represent nuclear 
degrees-of- freedom (including charge Z), e is the 
energy of the system, and the explicit representa- 
tion of the Hamiltonian H follows from specializa- 
tion to an isolated system of atoms under the Born- 
Oppenheimer approximation. 

The unknown function |^) is the wave function 
for the electrons and from it (along with knowl- 
edge of the nuclear positions {Ri}) one can cal- 
culate all accessible properties of the system. Un- 
fortunately, depends on the coordinates of all 
electrons within the system and, result, direct 
solution of Eq. ([3|) for the full many-body wave func- 
tion is difficult or impossible for all but the most 
trivial systems. For example, a single neutral wa- 
ter molecule has 10 electrons, so its wave function 
is a function of 30 variables (i.e. 10 electron posi- 
tions in three dimensions). While an analytical so- 
lution in this simple case is already not possible, it 
is conceivable that the Schrodinger equation could 
be solved numerically. However, to store the wave 
function on a numerical grid consisting of 10 points 
in each dimension (a laughably coarse grid) using 
single precision numerics would take 4 x lO^'' bytes 
(approximately 10^^ TB) of storage. This is "the 
curse of dimensionality" on a grand scale and ren- 
ders full solution of the Schrodinger equation for 
most systems utterly intractable. 

Surmounting this fundamental problem in a 
physical way is not easy and has consumed the ef- 
forts of chemists and physicists alike for decades. 
Prom those efforts, however, have sprung a number 
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of useful approaches. These can be split into two 
categories, wave function theories and density func- 
tional theory, both of which we will discuss in detail 
below. 

All the methods described in what follows 
have exhibited great success in describing various 
quantum-mechanical properties of molecules and 
materials in general. However, when dealing with 
biologically-relevant systems two special consider- 
ations arise: (i) such systems are typically quite 
large and (ii) their structure and binding is often 
dominated by weak van der Waals interactions. 
Since this review is focused on biological applica- 
tions of quantum-mechanical methods, special at- 
tention will be paid to the ability of each method 
to scale well with system size and to adequately de- 
scribe van der Waals interactions. As such, the abil- 
ity of a method to treat large systems involving van 
der Waals interactions will determine its applicabil- 
ity to the biologically-relevant systems considered 
here. 

2.3. Wave function approaches 

A simple solution to the dimensionality problem in- 
troduced by Eq. ([3]) is to seek solutions of the form 

^{{n}) = Mn) Mr2) ■ ■ ■ Mrn) , (4) 

that is, to assume that the total electron wave func- 
tion can be separated and written as a product 
of single-electron states (orbitals) (pi. The Pauli- 
exclusion principle and the anti-symmetry of the 
wave function can be enforced by forming a Slater 
determinant of the single-particle solutions. Since 
the Fock operator used to find the orbitals depends 
explicitly on those orbitals, the resulting equations 
are generally solved self-consistently. This approach 
is a form of mean-field theory where each electron 
responds to the average field created by all other 
electrons residing in their single-particle orbitals. 
The advantage of this approach is that each orbital 
is now a function of the three spatial coordinates, 
making numerical calculations computationally fea- 
sible. Wave function methods are described in detail 
in Ref. M 

The Hartree-Fock (HF) method, which takes 
this approach, is relatively fast and based on sound 
quantum mechanics, but the approximations in- 
voked by its use miss some crucial physics. In par- 
ticular, electrons are dynamic entities. The total 
energy of the system can be lowered if, averaged 
over some degree-of-freedom, the electrons correlate 



their behavior. Correlation is (almost) completely 
missed in the Hartree-Fock method, which explic- 
itly assumes single-particle states — the static corre- 
lation due to the Pauli exclusion principle is fully 
accounted for. Nevertheless, HF theory is a good 
first-order starting point for corrections that incor- 
porate electron correlation into the total wave func- 
tion and its associated energy. Such methods are 
termed post-HF methods, since they use the results 
of a HF calculation as a starting point to incorpo- 
rate electron correlation explicitly. 

There are many post-HF methods that exhibit 
various accuracies coming at related computational 
costs. One of the best features of the wave func- 
tion methods is their segregation into a hierarchy 
of so called "levels of theory". Thus, one knows in 
some sense, to what degree a result can be trusted, 
depending on the precise method used. If better re- 
sults are desired, one merely has to progress to a 
higher level of theory. Basis sets (the set of func- 
tions used to expand the wave functions) are also 
of critical importance. They too, however, exhibit a 
hierarchy of complexity and applicability. Figure [1] 
gives a cartoon depiction of how one can approach 
the numerically exact solution \^) by combining a 
large basis set with a high level of theory. 




Fig. 1. Map of the route from classical physics to quantum 
physics via quantum chemistry. Basis sets are represented on 
the horizontal axis and increase in size as more functions are 
added. The level of theory is indicated on the vertical axis. 
There is a concomitant increase in computational complexity 
as one moves along the path from classical physics to quan- 
tum physics. A plot of this nature is often called a Pople 
diagram. 
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The most rigorous nrethod to include electron 
correlation is full configuration interaction (CI). In 
the CI method, one starts as usual with the orbitals 
found by a Hartree-Fock calculation. Instead of us- 
ing a single Slater determinant of these functions, 
however, a linear combination of Slater determi- 
nants is formed, each one corresponding to one pos- 
sible ordering of electrons in the orbitals. In other 
words, all possible combinations of electron excita- 
tions are given a Slater determinant, and the opti- 
mized linear combination of these yields the numer- 
ically exact wave function. This renders Full CI a 
combinatorial problem — taking a given number of 
electrons and producing all possible excitations to 
a given set of orbitals. Thus, full CI scales facto- 
rially with the number of basis functions used and 
therefore is not practical in all but the smallest of 
systems. 

Perhaps the next best thing to a full CI calcu- 
lation is to use coupled cluster theory. The coupled 
cluster approach mimics CI but using only small 
numbers of electron excitations, usually considering 
only excitations of one to three electrons. The most 
common variant of coupled-cluster theory is notated 
CCSD(T), which includes single and double exci- 
tations iteratively, and triple excitations perturba- 
tively. This has proven incredibly reliable and rep- 
resents the "gold standard" for accurate quantum- 
chemistry calculations. Although it has polynomial 
(rather than factorial) scaling in the number of ba- 
sis functions used, the asymptotic scaling of 0{N'^) 
for the generally used form renders this approach 
mainly useful on relatively small systems of perhaps 
30-50 atoms. 

Among the most used post-HF methods is 
M0ller-Plesset perturbation theory at second order 
(MP2) or higher order (MPS, MP4, . . . ). In pertur- 
bation theory, one seeks to find the solution of 



Ho + XH' 



E\ 



(5) 



where the perturbation strength factor A is assumed 
small, and the solution to the unpertured problem 
(A = 0) is already known. In this case, H is the 
non- interacting Hartree-Fock Hamiltonian and H' , 
which is assumed to be small in effect relative to 
Hq, is the Hamiltonian for inclusion of electron cor- 
relation. MP2 expands this expression in terms of 
powers of A up to second order. This can be used 
to correct both energies and wave functions. 

MP2 has shown great success, but it is not per- 
fect. Comparison with coupled cluster and full CI 



methods have shown that MP2 often significantly 
overestimates the correlation, especially in delocal- 
ized TT systems .'^'^ Usage of the higher-order ex- 
pansions (e.g. MP4) may yield increased accuracy, 
but the results are not as straightforward as one 
might hope, as convergence of the M0ller-Plesset 
series has been shown to be unreliable.'^ In many 
cases, estimates of correlation may get worse with 
increasing order; sometimes oscillating or even di- 
verging in the worst cases .'^^l Convergence depends 
on both the system under study and the basis set 
being employed, with poor results often accom- 
panying use of the diffuse functions required to 
correctly model dispersion interactions. Neverthe- 
less, MP methods are highly prized in quantum- 
chemistry wave function calculations because they 
contain a good balance of accuracy and computa- 
tional efficiency. The asymptotic scaling of MP2 (as 
0{N^)) makes it substantially cheaper than high- 
level coupled cluster methods. MP2 can be used on 
systems of respectable size. A system with a hun- 
dred atoms or more is not out of the reach of an 
MP2 calculation on a high-end computer. 
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Fig. 2. Maximal system size (measured by number of atoms) 
that various quantum mechanical methods can treat, as a 
function of time. "Exact treatment" refers to an exact solu- 
tion to the Schrodinger equation and QMC stands for Quan- 
tum Monte Carlo (not discussed here). All other methods 
are discussed throughout the text. (Reprinted with permis- 
sion from Ref. [2T]; © 2008 American Physical Society). 
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2.4. Density functional theory 

Wave Function theories have a number of nice prop- 
erties, but they scale poorly with systems size. A 
completely different approach, density functional 
theory (DFT), scales as 0{N'^), and is therefore 
much more amenable to calculation of large sys- 
tems. Calculations can be performed on systems 
consisting of perhaps several thousand atoms, mak- 
ing it applicable to biochemical systems. 

In 1964 Hohenberg and KohrPSI published 
a seminal paper showing that the quantum- 
mechanical energy of a set of atoms can be writ- 
ten uniquely functional of the electron charge 
density (within the Born-Oppenheimer approxima- 
tion). Furthermore, the charge density no(f) that 
minimizes this functional is the ground-state charge 
density for the system, and all measurable proper- 
ties of the system can be written in terms of this 
optimal charge density. This avoids the dimension- 
ality problem of Eq. ([3]) by shifting the quantity of 
interest to the charge density in real space, a func- 
tion of only three variables regardless of the number 
of electrons. 

Density functional theory as a modern ap- 
proach was initiated when Kohn and SharrPSI wrote 
the energy as a density functional of the form 

^DFT [n(f)] = E^[n{r)] + ^n-c [n(f)] + E^.^[n{r)] 
+ E^cHr)] + En-N , (6) 

where Ey^ is the total kinetic energy of the sys- 
tem, in principle written as a density functional, 
but in practice written as a functional of the Kohn- 
Sham orbitals. An analytical density functional for 
E\^ is not known, but approximations to it lead 
to so called orbital-free methods. The final term 
in Eq. ([6]) is the nucleus-nucleus repulsion term, 
which can be treated simple additive con- 

stant since it is uniquely determined by the posi- 
tions of the nuclei and these are decoupled from 
the quantum-mechanical problem by use of the 
Born-Oppenheimer approximation. Analytical ex- 
pressions are known for both -En-c (the nucleus- 
electron, effective 1-body term) and E^.^ (the 
Hartree term giving the average electron-electron 
interaction), leaving the exchange- correlation func- 
tional Excin^f)] as the sole unknown object in 
Eq. dSD. 

If the exchange-correlation functional and its 
functional derivative with respect to the density 
were known, they could be used to optimize the to- 
tal energy functional (£'dft["'(^)]) with respect to 



the density, thereby finding the ground-state den- 
sity. Since, E^c is not known however, it must be 
approximated in some way. This is the main ap- 
proximation in DFT and determines the method's 
applicability to a particular system. Not surpris- 
ingly then, much effort is put into improving the 
approximations made in generating £'xc[?T.(r )]. 

One approach is to assume that the exchange- 
correlation energy is a local functional of the 
density — one that depends on n(r) in a point- 
wise fashion. f^f^ This local density approximation 
(LDA) is good when the density is slowly varying, 
becoming exact in the limit of a uniform electron 
density. 

Despite its simplicity, the LDA is amazingly 
good in many systems, especially in those with rel- 
atively concentrated charge density such as crys- 
talline environments where metallic or covalent 
bonding dominates. For molecules, where direc- 
tional covalent bonds are the primary interaction, 
it tends to perform less adequately. One can imag- 
ine the LDA as the zeroth-order term in the Taylor 
expansion of the density about each point, and envi- 
sion adding additional, derivative-dependent terms. 
A functional depending on the density and its gradi- 
ent (first derivative) in a point-wise fashion is called 
a semi-local functional, and the approximation of 
the true energy functional in this way is called the 
generalized gradient approximation (GGA).'2S1 This 
approximation is a substantial improvement over 
LDA in many systems, particularly molecules. 

The E^c term in Eq. ^ must approximate 
the effects of both exchange, which removes the 
unphysical electron self interaction while enforcing 
the Pauli exclusion principle, and correlation, which 
roughly speaking, accounts for the fact that each 
electron experiences a highly dynamic environment 
rather than a mean field of the other electrons. In 
Hartree-Fock theory, the form of the exchange op- 
erator is known, so exchange could be treated ex- 
actly and combined with an approximate correla- 
tion functional. Unfortunately, most functionals ex- 
hibit serendipitous error cancellations between their 
exchange and correlation pieces, making just using 
the correlation contribution prone to large errors. 
In 1993 Becke proposed using a 50%-50% mix of 
exact exchange and LDA,'^ eventually leading to 
the 3-parameter B3LYP functionaP^ and similar 
hybrid functionals, which are among the most ac- 
curate functionals for covalently-bound molecules. 
Early successes of hybrid functionals led some to 
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erroneously believe that they could describe weak 
van der Waals interactions. Unfortunately, hybrid 
functionals, being a linear combination of exact ex- 
change and (semi)-local exchange-correlation ap- 
proximations, cannot account for van der Waals 
interactions. This is because van der Waals inter- 
actions are a non-local correlation effect, and any 
functional that is local or semi-local in correlation 
is — by construction — not able to reliably describe 
them.f^Sl There is ample discussion in the literature 
of the poor performance of standard hybrid func- 
tionals in weakly bound complexes. 1^^^^ 

2.5. van der Waals interactions in 
DFT 



As evident from the discussion in Section 12.41 and 
Fig. [21 DFT is capable of treating large systems of 
perhaps thousands of atoms, which is one of the re- 
quirements if it is to be applicable to systems of 
interest in molecular biology. However, at the same 
time, it also has to be able to accurately describe 
weak van der Waals interactions, which play an im- 
portant role in biomolecules. Historically, DFT has 
not performed well when applied to systems with 
van der Waals interactions — this is probably the 
single most important problem that has prevented 
DFT from gaining a strong foothold in molecular 
biology. Below we discuss the shortcomings of stan- 
dard DFT and several recent developments that 
overcome this barrier, leading to a full applicabil- 
ity of DFT to large biomolecules. 

In standard DFT, the exchange-correlation 
functional is often assumed to be local, i.e. a single 
spacial integral of the exchange-correlation energy 
density, which depends explicitly on the charge- 
density. This approach leads to the so called lo- 
cal density approximation (LDA). Adding a de- 
pendence on the gradient of the charge density 
results in the generalized gradient approximation 
(GGA), while inclusion of higher-order derivative 
terms yield meta-GGA functionals. However, this 
approach fails to correctly account for van der 
Waals (vdW) interactions, which are non-local cor- 
relation effects; they occur between physically sep- 
arated regions of charge, generally with little over- 
lap of their density functions. Capturing these ef- 
fects correctly requires a functional that expresses 
the exchange-correlation energy as (at minimum) a 
double spacial integral, van der Waals interactions, 
ubiquitous in polyatomic systems, occur when elec- 
tron motions in one atom (or within one molecule) 



correlate with electron motions in a nearby atom (or 
molecule) setting up transient but interacting mul- 
tipoles within each.l^SI Correlation between electrons 
lowers their energy relative to uncorrelated elec- 
trons, so the van der Waals force is always attrac- 
tive. In some systems, crystalline NaCl for example, 
the contribution of these interactions to the overall 
binding are negligible. In other systems these in- 
teractions can be an appreciable part of the overall 
interaction. Nobel gas dimers such as Ar2 and Kr2 
are held together entirely by van der Waals inter- 
actions. Large diffuse molecular systems (prime ex- 
amples being biological macromolecules) rely quite 
heavily on van der Waals interactions for their sta- 
bility, so such interactions play an integral role in 
their behavior. 

With this in mind, numerous attempts were 
made to include the ability to capture van der Waals 
interactions within conventional DFT. A thorough 
account of all these efforts is beyond the scope of the 
present review, but several promising approaches 
will be discussed. 

2.6. DFT-D 

As stated earlier, van der Waals interactions arise 
when electronic motions within separated atoms 
correlate, setting up transient multipole moments 
within the individual atoms. One can expand 
the dispersion energy of two arbitrary, polarizable 
charge densities in terms of the interactions of in- 
duced multipoles. If a point of interest is located at 
a distance r that is large compared to some charac- 
teristic length scale of the charge distribution, the 
pairwise dispersion energy can be expanded in pow- 
ers of 1/r ad^ 



(7) 



where the constants Cj correspond to a particular 
system and determine the relative strengths of the 
various terms. 

For sufficiently large distances r, the dipole- 
dipole term dominates and dispersion interactions 
go as This observation is the basis of the den- 
sity functional theory with added dispersion (DFT- 
D) method.HSl'SZlEl] Typically, this method works by 
adding to the total energy a pairwise atomic correc- 
tion of the form 



EvdW — -X ^ /damp (Rlj) 



IJ 



/?6 



(8) 
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where -Evdw is the dispersion energy, C/j is an 
empirically-derived coefficient that is atom-pair- 
dependent, /damp (-R/j) is a damping fmiction, and 
the sum runs over all pairs of atoms. The damping 
function ranges from at small Rij to 1 for larger 
separations, and is required because the asymptotic 
form becomes unphysical as distances become 
small. The specific form of the damping function 
plays a role in the accuracy of the technique.^Sni 
Too weak a damping with decreasing distance re- 
sults in over-counting of the interaction energy. Too 
strong a damping will weaken the vdW interactions 
at relevant ranges. The most critical aspect of the 
damping function is how it behaves at intermediate 
distances near the bonding length of a vdW bond. 
Much attention has been paid to the form of the 
damping function, and opinions differ on its optimal 
form. Commonly, the damping function is given the 



/damp 



(9) 



where a is a chosen constant and i?o sets the rele- 
vant distance scale for the interaction of atoms I 
and J and is generally chosen to be the sum of 
their van der Waals radii. This was the form cho- 
sen by Grimme in 2004,12] when he published a set 
of Ce coefficients based on a database of dipole 
oscillator strength distributions, for a number of 
important atoms and demonstrated the method's 
effectiveness on a large set of molecular systems. 
The values of the coefficients (and correspond- 
ing vdW-radii) depend somewhat on the choice of 
exchange-correlation functional, so Grimme added 
an empirical parameter he called sg that scales the 
interaction, adjusting its strength to the functional 
being used. Approaches like the DFT-D method are 
not new, dating back at least as far as London him- 
self,'^ but they have proved extremely useful at 
many levels of atomic theory, and continue to be 
so within DFT. 

2.7. DFT+vdW 

The pairwise dispersion correction given by Eq. ([8]) 
is not a density-functional, but instead relies on fit- 
ting to a chosen set of external data. The data used 
in the fit and the interaction between the disper- 
sion correction and the functional coupled with it 
both affect the results obtained. Such a fitting pro- 
cedure can limit transferability between systems. 
The original DFT-D approach of Grimme has been 



re-parameterized many times both for improved ac- 
curacy and for application to other systems.'^'^ZlSSl 
To improve on the transferability and overall 
accuracy of DFT-D, Tkatchenko and Scheffler pro- 
posed an alteration, which uses a relative Cq coef- 
ficient calculated on-the-fly from the charge den- 
sity.S^l In this approach (hereafter referred to as 
DFT+vdW) they define the effective volume for an 
atom A within a system, relative to the free-atom 
volume as: 



T/free 



if A {"t) n(r ) d f 



'3^ 



(10) 



where /a(^) is the fraction of the density at r aris- 
ing from atom A in a linear combination of free- 
atom charge densities. 

Starting from the free- atom coefficients Cq'^'^, 
they define the effective coefficient for an atom 
within a molecule or solid as 
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with the hetero-nuclear combination rule for atoms 
A and B defined as: 



AB 



r,riAAr^BB 
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(12) 



where is the static polar izability of atom i. Thus, 
in the DFT+vdW approach one may write 



/?6 



(13) 



with / and J ranging over all atoms. That is, the 
dispersion energy can be written as a functional (al- 
beit a non-universal one that depends on the ar- 
rangement of nuclei) of the charge density. Writ- 
ing the Cq coefficients as density functionals allows 
for the polar izability of atoms to be a dynamic, 
environment-dependent quantity. If it could be cal- 
culated, the functional derivative of this expression 
with respect to the charge density would yield the 
Kohn-Sham potential for the dispersion energy, al- 
lowing the latter to be calculated self-consistently. 
It is not clear at present whether this would sig- 
nificantly affect the interaction energies of vdW 
compounds when using the DFT+vdW method. 
Tkatchenko and Scheffler do note, however, that the 
use of Eq. (flOl) largely cancels the charge density dif- 
ferences arising from the use of different functionals. 
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making their method less sensitive to the particu- 
lar exchange-correlation functional used compared 
with static Cq/t^ approaches. 321 

In 2008, Tkatchenko and von Lilienfeld noted 
that many body effects can play a significant role 
in the energetics of vdW-rich systems.'^ This is es- 
pecially true in bulk, where close-packed atoms can 
be geometrically arranged in many complex ways. 
In particular, three-body, triple-dipole interactions 
can contribute substantially to binding energies, 
typically raising them relative to pure pairwise in- 
teractions. Given the recent surge of inquiry into 
metal organic frameworks and other molecular crys- 
tals,!^ the ability to account for this fact may be- 
come of increasing interest. In 2010, an expanded 
version of DFT-|-vdW was described, wherein an 
additional three-body term was added.lSSlThe three- 
body term was given the form 

p3-body _ „ABC ^(^O^i^AB) COs{<pBc) COsjcpAc) + 1 
-^vdW ~ '-'9 7?3 p3 r3 ■ 

-"-AB^BC^AC 

(14) 

where Rjj is the distance between atoms / and J 
and is the angle between Rki and Rrj- The 
formulation of this expression into a density func- 
tional then follows in a fashion similar to that of 
the Cq term, the fundamental difference being the 
inclusion of an angular dependence in the damping 
function. 

2.8. vdW-DF 

An alternative approach to the addition of pair- 
wise atomic dispersion terms is to express the to- 
tal energy of a system directly as a non-local func- 
tional of the density. That is, to write the exchange- 
correlation functional in such a way that it depends 
simultaneously on the charge density at multiple 
points. In principle, this is the optimal approach 
because the true exchange-correlation functional is 
fundamentally a non-local functional. Treating it on 
such a footing allows for its integration into DFT 
in a seamless and self-consistent manner. 

In the van der Waals density functional (vdW- 
DF) approach, the exchange-correlation functional 
E^c takes the form 

= + j n{n) (Pin , fa) n(f2) dVidVa , 

where E]f,^^' is a local-like piece of the functional 
that is assumed to be well modeled by standard 



functionals and £;non-iocai jg ^ non-local piece, which 
is evaluated by considering all pairwise points in the 
charge density. The kernel function <f) describes how 
charge densities at fj and fj correlate. 

A meaningful form for (j) was described by Dion 
et al. in 2004, leading to the van der Waals density 
functional (vdW-DF).H3 This functional evolved 
from a less general one restricted to planar geome- 
tries.l^ The analytical form of (/) its numeri- 
cal computation are onerous, but since (p itself does 
not depend on the density, it can be calculated and 
tabulated once-and-for-all. The functional deriva- 
tive of Eq. (fT5]) with respect to the charge density 
was given in 2007,^^ allowing for completely self- 
consistent calculation of energies and forces using 
this method. 

The functional as originally proposed required 
the evaluation of a double integral over three- 
dimensional space, as one might expect from a non- 
local functional. This made the use of the func- 
tional costly relative to other local or semi-local 
options. However, in 2009 Roman-Perez and Soler 
effected a great simplification, by transforming the 
double integral into a single integral over Fourier 
transforms using the convolution theorem.^SS Since 
Fourier transforms are efficiently obtained and/or 
readily available in plane- wave DFT codes, this 
dropped dramatically the time required to evaluate 
the vdW-DF functional and made the cost of its use 
on par with that of a similar GGA calculation. 

It was quickly noted that, when used on vdW- 
rich systems, the functional produced binding dis- 
tances that were slightly larger compared with ex- 
periment or high-level calculations This led to the 
assertion that the revised Perdew-Burke-Ernzerhoff 
exchange functional,!^ originally chosen to accom- 
pany the vdW-DF because it exhibited minimal 
spurious binding of its own, was too repulsive.'^ 
Lee et al. revised the approach in 2010, recommend- 
ing the use of a less repulsive revised version of the 
Perdew-Wan^SSlET] exchange functional and chang- 
ing the value of a gradient coefficient.'^ These small 
changes improved the method's accuracy for both 
energy and geometry in many systems. For an in- 
depth review of this approach see Ref. EH 

2.9. Other methods 

There are a number of other approaches that are 
capable of describing van der Waals interactions 
within a DFT framework. A full listing is beyond 
the scope of this review, but several of the more 
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common approaches are briefly discussed here. 

In symmetry adapted perturbation theory 
(SAPT), the interaction energy of a system is 
written as a perturbative expansion in terms 
of physically-meaningful interactions. The Hamil- 
tonian for a superposition of non-interacting 
monomers is taken as Hq, with the interaction 
between monomers forming the perturbing poten- 
tial.'^ Terms in the perturbation generally include 
electrostatic, exchange, induction, and dispersion 
interactions.'^'^ The principle advantage of this 
approach is that the relative contributions from dif- 
ferent physical interactions can be determined ex- 
plicitly. This leads to an intuitive interpretation of 
the interaction energy. The downside to the ap- 
proach is its computational cost since the method 
scales as 0{N^) (when taken to second order) with 
increasing system size.'^ It is therefore limited to 
relatively small systems. See Ref. EU| for an excel- 
lent overview of SAPT and its applications. 

Zhao and Truhlar have developed a series of 
functionals designed to obtain accurate energies for 
weakly-bound systems.'^'^ These functionals have 
been shown to work well for the vr-stacking and hy- 
drogen bonding interactions that are omnipresent 
in biological macr omolecules [30][65l |75H76] fpj-^g advan- 
tage of these functionals is their efficiency, being es- 
sentially the same computational cost as a typical 
DFT calculation. There are several families of these 
functionals, designed and parameterized to apply 
to different chemical situations. The functionals are 
known to poorly describe dispersion interactions in 
the asymptotic limit, where they decay exponen- 
tially rather than as 1/r^.'^ Nevertheless, they have 
seen heavy use recently for their ability to accu- 
rately and efficiently capture the short-ranged con- 
tributions to dispersion interactions. 

In the dispersion-corrected atom-centered po- 
tential (DCACP) approach of von Lilienfeld et al.,'^ 
van der Waals interactions are handled by means of 
an effective electron-core interaction. Typical plane- 
wave density functional theory approaches utilize 
pseudopotentials, which treat nuclei and core elec- 
trons together as an effective, angular momentum- 
dependent potential. The potential is designed such 
that the all-electron wave-function is reproduced 
faithfully. In the DCACP approach, the non-local 
piece of this effective core potential is optimized 
to reproduce high-level calculations of molecular 
properties, specifically, the dispersion energies and 
forces within molecules. Since the DCACP method 



uses the same type of effective core potential that 
is traditionally used in plane-wave calculations, its 
use does not impose additional computational com- 
plexity. The effective potential is designed as a van 
der Waals correction to standard gradient-corrected 
exchange-correlation functionals, so potentials must 
be optimized for each type of atom and for every 
exchange-correlation functional that the method is 
to be paired with. Optimized effective potentials 
have been generated for all the standard biolog- 
ical atoms (carbon,'^'^ nitrogen,'^ oxygen,'^ hy- 
drogen,'^ sulfur,'^ and phosphorud^, each with 
several gradient-corrected functionals. The method 
shows good transferability and has been used in 
moleculai'^ as well as solid-state applications.'^'^ 
Although van der Waals interactions are gen- 
erally thought of as a correlation effect, the ap- 
proach of Becke and Johnson takes a wholly differ- 
ent viewpoint, treating them instead as arising from 
interactions between an electron-exchange hole pair 
in one system and an induced dipole in another.'^ 
This viewpoint is motivated by the fact that the 
exchange hole is, in general, not spherically sym- 
metric, so the electron-exchange hole system has a 
non-zero dipole moment. This dipole moment does 
not affect the energy of the system containing the 
electron-exchange hole pair since only the spherical 
average of the exchange hole enters the energy ex- 
pression for a system. This electron-hole dipole can 
correlate with a separate system, however, yielding 
a dispersion-like interaction. When averaged over 
the entirety of a system, the approach yields molec- 
ular Cg coefficients in good agreement with those 
from high-level methods.'^ These can be decom- 
posed into atomic Cg coefficients and used in a 
scheme similar to that in the DFT-D approach.'^ 
A required component of the approach is the dipole 
moment of the exchange hole, which Becke and 
Johnson conveniently cast as a meta-GGA func- 
tional by utilizing the approximate Becke-Roussel 
form for the exchange hole.'^ Further development 
led to the ability to calculate Cg and Cio coeffi- 
cients.'^ This approach is simple, elegant, and per- 
forms well over a variety of systems. 

3. Applications to Biochemistry and 
Molecular Biology 

As can be seen from Fig. [21 of all the quantum 
mechanical methods discussed above, only DFT 
is currently capable of treating systems consisting 
of several hundred to several thousand atoms — i.e. 
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the lower end of the range of biologically relevant 
molecules. As such, in this application section we 
will almost exclusively focus on studies that have 
used DFT to investigate such systems. 

The methods outlined above represent current 
state-of-the-art DFT as it applies to vdW-rich sys- 
tems. In what follows, these methods' ability to do 
useful biochemistry will be highlighted through a 
brief survey of recent studies conducted both to 
test them and to learn from them. This survey is 
intended to act as a showcase of the capabilities of 
modern DFT, rather than a comparison of particu- 
lar methods of its implementation. 

3.1. Small molecules 

Small molecules make a natural proving ground for 
new methods in DFT because calculations can be 
compared with quantum chemistry methods. There 
exists an extensive body of work, much of it carried 
out by Sponer and HobzgP^l'^ and, independently, 
by Stefan Grimme,'^ benchmarking the DFT meth- 
ods discussed above against accurate wave func- 
tion approaches with special focus being placed on 
biologically-relevant molecular systems. This work 
has yielded encouraging results and forms the foun- 
dation upon which studies of the physics in these 
systems rests. But studying small molecules is use- 
ful in its own right, since these play a pivotal role in 
biochemistry. Most notable among the biologically- 
relevant small molecules are water and the building 
blocks of macromolecules themselves, namely, DNA 
bases and amino acids. 



3.2. Water 

Water has received special attention in the lit- 
erature, both because of its great importance 
to (bio) chemistry and because an accurate first- 
principles understanding of it has proven surpris- 
ingly elusive. Most molecular interactions within 
living systems occur in an aqueous environment, so 
an understanding of water is a necessary precursor 
to developing an understanding of in vivo biochem- 
istry. 

These days, the bulk behavior of water (e.g. 
phase diagram, radial distribution functions) is 
well modeled by parameterized force fields) '^^"^^ ^ Al- 
though these force fields get many of the properties 
of water correct compared with experiment, the fact 
that they were parameterized to do just that lim- 
its their usefulness as a tool for understanding the 



atomistic interactions in water. At the fundamen- 
tal level there are quantum effects, most notably 
the quantum-mechanical nature of the hydrogen nu- 
clei,'^'^that cannot be easily reproduced with clas- 
sical models. This clouds the connection between 
microscopic effects and bulk behavior. A full un- 
derstanding of the behavior of water can only come 
from a quantum mechanical description that applies 
at the microscopic level, but can be extended up to 
the macroscopic limit. 

The behavior of small water clusters (H20).„ 
with n less than about 6 has been extensively stud- 
ied at the quantum level and is largely under- 
stood.'^^'^ Minimum energy geometries can be cal- 
culated with high level wave function methods and 
these have been compared with various DFT treat- 
ments. At this level, standard DFT does a reason- 
able job at describing the geometric and energetic 
properties of water, but some improvement can be 
made by including dispersion interactions .'S^^^ Al- 
though the hydrogen bonds that govern water's 
structure are not typically thought of as a van der 
Waals effect, recent studies have shown that geome- 
tries, energies, dipole moments, and vibrational fre- 
quencies of small water clusters are all improved by 
inclusion of van der Waals interactions,'^^^'^ as can 
be seen in Figs. [3] and |4l 
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Fig. 3. Systematic improvement in the description of wa- 
ter properties with the inclusion of van der Waals interac- 
tions. Calculations on small water clusters including van der 
Waals interactions (vdW-DF) compared with standard lo- 
cal (LDA) and gradient-corrected (PBE) functionals. Shown 
are the binding energies, equilibrium geometries, and dipole 
moments for each cluster. (Reprinted with permission from 
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Ref. [M]; © 2011 American Physical Society). 

The improved description of water when van 
der Waals effects are included is not hmited to small 
water clusters, but continues into the bulk.'^'^SlEllSZI 
Through a series of ah initio molecular dynamics 
simulations Lin et a\^^ showed that the radial dis- 
tribution functions produced by standard gradient- 
corrected functionals tend to produce water that 
is over-structured compared with experiment. This 
was also evident in the average number of hydro- 
gen bonds and the self-diffusion coefficient, both 
of which show an over-structuring of the water 
molecules. These results mirror obtained by numer- 
ous other groups working with a variety of different 
codes, exchange-correlation functionals, and basis 
gg^^98H102] xhis over-structuring is mitigated to a 
large degree by a proper treatment of van der Waals 
interactions. The self diffusion coefficient increases 
three-fold and the over-structuring evident in the 
radial distribution functions softens when van der 
Waals interactions are included. This is also true for 
bulk ice in its standard hexagonal form (I/j) where 
inclusion of van der Waals interactions again im- 
proves the description of structural and electronic 
properties. 
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Fig. 4. Decomposition of the oxygen-oxygen radial distribu- 
tion function as calculated by a standard gradient-corrected 
functional (PBE) and vdW-DF (here called DRSLL-PBE). 
The interactions are broken into (from top to bottom) first 
coordination shell hydrogen bonds, second coordination shell 
hydrogen bonds, and higher-order interactions. (Reprinted 
with permission from Ref. [96]; © 2011 American Institute 
of Physics) . 



It is worth pointing out that the results of 
DFT calculations can vary quite widely depending 
on the choice of basis set and exchange-correlation 
functional used, and great care must be taken 
with their selection. For example, when coupled 
to the non-local piece of the vdW-DF, the overly 
repulsive revised PBE exchange functional actu- 
ally produces water that is under- structured com- 
pared with experiment, in contrast to most other 
exchange functionals. This is related to the afore- 
mentioned tendency of the original vdW-DF to 
predict intermolecular interaction distances that 
are large compared with experiment and high-level 
wave-function methods. Additionally, it has been 
pointed out that the properties of liquid water cal- 
culated within DFT ca n d epend quite strongly on 
the choice of basis set.'^^ Calculations similar to 
those shown in Fig. H] were carried out by Zhang 
et al. and showed considerably less improvement in 
the oxygen-oxygen radial distribution function com- 
pared to experiment The basis sets used in the 
two sets of calculations were fundamentally differ- 
ent, making a direct comparison of their appropri- 
ateness difficult. Despite these issues, it is gener- 
ally agreed that, when properly chosen basis sets 
and exchange-correlation functionals are used, the 
inclusion of van der Waals interactions fundamen- 
tally improves the DFT description of water, both 
at the microscopic level and in bulk. 

It is interesting to note that, although inclu- 
sion of van der Waals interactions greatly improves 
the description of water, this alone does not com- 
plete the picture of important effects within wa- 
ter. The standard Born-Oppenheimer approxima- 
tion used in quantum-mechanical studies treats all 
nuclei as classical point particles. Recent work by 
a number of groups, however, has shown that nu- 
clear quantum effects may play a signif icant r ole 
in determining the properties of water .EIlllMHlQSJ jj-^ 
fact, it has been shown that such nuclear quantum 
effects may be more far-reaching, playing a sub- 
stantial role in hydrogen bonds in general, not just 
between water molecules.'^ This would, of course, 
have enormous consequences for a proper descrip- 
tion of interactions within biological molecules such 
as proteins and DNA, where hydrogen bonds of- 
ten dominate the binding. For example, it has been 
proposecP^ that the keto form of DNA nucleobases 
(the standard form required for Watson-Crick hy- 
drogen bonding) can spontaneously tautomerize via 
hydrogen tunneling to the enol form, a process 
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which could be responsible for some types of DNA 
damage. A recent study by Perez et al. found that, 
although such tunneling does occur, the metastable 
enol form has a lifetime too short to play a sig- 
nificant role in DNA mismatch damage .'^^ In fact, 
the effects of quantum nuclei appear to dynamically 
stabilize the keto form. For a recent review of these 
considerations see Ref. 11091 

3.3. DNA nucleobases 

The four nucleobases, arranged in different se- 
quences along strands of a sugar-phosphate poly- 
mer, have enabled the information of life to be 
stored and propagated since life began. Each of 
these relatively simple molecules contains an aro- 
matic ring capable of engaging in multiple hydro- 
gen bonds. When these bases come together in 
a Watson-Crick, edge-on manner, they can form 
hydrogen bonds strong enough to hold two DNA 
strands together. When brought together in a par- 
allel face-on fashion, they form tt-tt stacking inter- 
actions strong enough to give it an average persis- 
tence length of roughly 50 nm with some sequences 
having even larger persistence lengths.^^ 

Cooper, Thonhauser, and Langreth calculated 
the base interaction energy as a function of distance 
for a Watson-Crick, edge-on approach of two base 
pairs (see Fig. [5|).'^^ This was done for the A:T, 
A:U, and G:C combinations. The G:C base pair ex- 
hibits a maximum interaction energy of about twice 
that of the other pairs, not surprising since it has 
an extra hydrogen bond, and all three show similar 
equilibrium binding distances. 

The base stacking energy as a function of ge- 
ometry has been studied by several groups .1^^^ The 
binding energies as a function of twist angle for all 
possible stacked base pairs are shown in Fig. [6l It is 
noted that the methyl substitution that differenti- 
ates thymine from uracil stabilizes the systems with 
respect to twist. 

In 2006, Jurecka et al. published a set of ac- 
curate, quantum-chemical binding energies of 22 
molecular dimers, selected for the importance of 
van der Waals interactions within them.'^ The set 
(dubbed the S22 dataset) was broken into three dis- 
tinct groups: (i) dimers for which hydrogen bond- 
ing is the key component of binding, (ii) dimers for 
which pure dispersion is the key component of bind- 
ing, and (iii) dimers which exhibit a mixture of both 
of these effects. Comparison with this dataset be- 
came the de facto metric for assessing the ability 



of fledgling methods within DFT to correctly ac- 
count for van der Waals interactions. Within this 
set (which was later revised, exp and ed, and placed 
in a convenient online database j^Hl were a homo- 
dimer of uracil and an A:T heterodimer. 




Fig. 7. Ratio of 3-body dispersion term to total CCSD(T) 
binding energy for the 22 systems in the S22 data set (data 
taken from Ref. l36p . Note that the hydrogen-bond dominated 
systems show httle dependence on 3-body dispersion while 
the dispersion systems show variable but significant 3-body 
contributions. Specifically marked are the uracil-uracil U:U 
and adenine-thymine A:T dimers, which exhibit the highest 
3-body dispersion fraction of all the systems. The gray boxes 
denote the average 3-body dispersion fraction within each 
group. 

In 2010, a landmark paper by von Lilienfeld and 
Tkatchenko showed that the uracil-uracil U:U and 
adenine-thymine A:T stacked bases exhibit large 3- 
body dispersion terms.l^ Going a step further, the 
authors addressed the magnitude of two and three- 
body dispersion interactions across the entire S22 
dataset. Some of their results are shown in Fig. [71 
Using the DFT-|-vdW approach enhanced with the 
triple-dipole term (as discussed in Section [2]), they 
found that the three distinct groups of the S22 
set show markedly different dependencies on three- 
body dispersion interactions. The systems showing 
large 3-body dispersion terms (which include the 
stacked U:U and A:T dimers) were the systems 
deemed dispersion-dominant and those with essen- 
tially no 3-body dispersion interactions were sys- 
tems dominated by hydrogen bonding. The authors 
argue that 3-body effects may be more important 
than previously thought. Interestingly, for stacked 
nucleobases the 3-body dispersion term seems to 
be relatively constant, especially compared with the 
pairwise dispersion term. Figure [8] shows the 2 and 
3 body dispersion terms calculated by von Lilien- 
feld and Tkatchenko for 42 stacked nucleobases and 
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N1(A) - N3(T) distance [A] N1(A) - N3(U) distance [A] N1(G) - N3(C) distance [A] 

Fig. 5. Interaction within A:T, A:U, and G:C base pairs as a function of N1---N3 separation for the canonical Watson-Crick, 
edge-on orientation. Both GGA (solid circles) and vdW-DF (solid triangles) energies are shown. The labeled atoms are the 
non-hydrogen atoms engaging in hydrogen bonding. (Reprinted with permission from Ref. [Ill) : © 2008 American Institute 
of Physics). 




Fig. 6. Interaction energy as a function of twist for all possible DNA base pair steps as well as those for their uracil-containing 
counterparts. The insets give a top view of the molecular interaction for the AT: AT (top left), TA:TA (top center), and AA:TT 
(top right) steps. All calculations were carried out using the vdW-DF. (Reprinted with permission from Ref. [112) : © 2008 
American Chemical Society). 



base pairs. The 3-body contribution to the energy 
is relatively constant across the entire dataset while 
the 2-body term varies considerably, especially for 
the weaker-binding systems. 
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Fig. 8. Contributions from 2-body (red boxes) and 3-body 
(blue circles) dispersion interactions over all stacked base 
pairs investigated by von Lilienfeld and Tkatchenko (data 
taken from Ref. 136 [I. The green line shows the total binding 
energy as calculated by CCSD(T). 

3.4. DNA intercalation 

The TT-vr stacking interactions of DNA bases dis- 
cussed in the previous section are important for an- 
other reason. Many cancer-causing agents act by 
intercalating between base-pairs within a strand of 
DNA, preventing it from carrying out its normal 
functions. Ironically, some anti-cancer drugs can 
also act in this way. In the latter case the DNA 
is intentionally disturbed either to prevent its repli- 
cation or to trigger cell death. 

One well known intercalating anticancer drug 
is the poly-aromatic ellipticine molecule. This 
molecule can intercalate between base pairs of DNA 
where it is believed to interfere with the process of 
replication, effectively killing the cell.H^ Li et al. 
calculated the binding energy between the neutral 
ellipticine molecule and a single C:G base pair to 
be -18.4 kcal/mol.lSlNot surprisingly, the strength 
of the binding was shown to have a substantial de- 
pendence on the relative angle between the ellip- 
ticine and DNA bases, showing a relatively strong 
(several kcal/mol) preference for near parallel and 
anti-parallel conformations. Chun Lin et al. inves- 
tigated the intercalation of ellipticine between a 
cytosine-guanine base step (i.e. a pair of C:G base 
pairs) As shown in Fig. [9l they found that ellip- 
ticine is significantly attracted to the DNA complex 
even when it is several angstroms away and ulti- 
mately intercalates with a binding energy of about 
37 kcal/mol, in perfect accord with the earlier re- 
sults found by Li et al. Further, they found that the 
interaction was repulsive when van der Waals in- 
teractions were excluded from the calculations, von 
Lilienfeld and Tkatchenko found that the pairwise 



dispersion energy for this system to be a substan- 
tial -57 kcal/mol and the 3-body correction term 
was 8.9 kcal/mol.'^Si This is certainly a significant 
binding event and shows the strength with which 
aromatic molecules can interact with the loose vr 
electrons within DNA. Such interactions are com- 
mon for vr-stacked molecules such as benzene and 
the large number of relatively close neighbors within 
the vr-conjugated DNA bases is believed to be re- 
sponsible for the large interaction energy. 




AX (A) 

Fig. 9. Interaction energy of an ellipticine molecule as it in- 
tercalates into a CG:CG base step. Here Ai represents the 
displacement of the ellipticine from the final intercalated po- 
sition. The insets correspond to the fully intercalated struc- 
ture at Aa:: = A (top left) and Aa; = 5 A(bottom right). 
(Reprinted with permission from Ref. [7D] ; © 2007 American 
Chemical Society). 



The intercalation of both positively charged 
and neutral proflavine has also been studied.^n^ 
Neutral proflavine was found to bind to a C:G 
base pair with an energy of about -20.3 kcal/mol 
and charged proflavine with an energy near 12.1 
kcal/mol. The difference between the charged and 
uncharged binding was attributed to electrostatic 
effects rather than those of correlation. This con- 
clusion was reached largely because the results of 
standard PBE calculations, although they get the 
interaction energy of each system wrong, exhibit 
a similar difference between the two binding ener- 
gies. It is interesting to note that the binding en- 
ergy is larger for the positively charged proflavine 
even though the negatively charged backbone was 
omitted from these calculations. Again, a substan- 
tial preference for near parallel and anti-parallel rel- 
ative angles was found. 

The energetics of the interaction between 
proflavine and a T:A base pair was also studied, 
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with results qualitatively similar to those of the 
proflavine-C:G systemf^^ Proflavine was found to 
bind to T:A with a binding energy of -18 kcal/mol, 
again showing preference for near parallel and 
anti-parallel configurations. Steric clashes with the 
methyl group on the thymine base produced some 
interesting features in the rotation curve but did 
not change the overall preferred structures. 

Perhaps the most interesting finding of Li et al. 
was that both intercalators studied were found to 
have stronger interactions with a C:G base pair than 
a C:G base pair has with another C:G base pair, and 
that the angular depe ndence of these interactions 
qualitatively differ .E^ A C:G base pair dimer has a 
double- well minimum centered around 0°, with the 
minimum-energy configurations at a twist of about 
35° and -35°. The intercalators, by contrast, exhibit 
only one of these minima. This may partially ex- 
plain the disruption of secondary structure observed 
upon intercalation of these molecules, which may 
play an important role in their anti-cancer func- 
tion.EH 

3.5. Proteins 

Owing to their large size and complexity, simulation 
of proteins often proves to be a formidable challenge 
even for simple parameterized models. The applica- 
tion of quantum mechanics to a full protein is, un- 
fortunately, still beyond the reach of modern DFT. 
Recently, however, significant steps toward a quan- 
tum understanding of proteins have been made. 

Helical chains of alanine molecules are often 
studied because they are relatively simple yet they 
exhibit the canonical helix structure present in so 
many proteins. In addition, when capped with a 
charged species they can be formed experimentally 
so computed properties may be compared with ex- 
periment .^^^ In one study, Tkatchenko et al. looked 
at three heli cal forms {a, vr, and 3io) of poly- 
alanine chains.'^^ By comparing with PBE, a stan- 
dard gradient-corrected functional, they found sig- 
nificant van der Waals stabilization of all three he- 
lix types relative to the fully extended structure. In 
fact, PBE predicts nearly equal stabilization ener- 
gies for all three whereas the van der Waals calcu- 
lations showed a splitting of about 2 kcal/mol be- 
tween the a-helix and 3io structures. The authors 
note that the van der Waals effects are of much 
shorter range than the standard hydrogen bond sta- 
bilizations in the helical forms, since the helices are 
long-ranged structures exhibiting periodic hydrogen 



bonds. Despite this, the study found that van der 
Waals interactions were critical to explain the ob- 
served stability of poly-alanine helices up to about 
700 K. Through ab initio molecular dynamics cal- 
culations Tkatchenko et al. found that, when van 
der Waals effects were excluded, the helical struc- 
ture gave way to the fully extended form at a tem- 
perature well below that observed experimentally, 
even though hydrogen bonds were still correctly ac- 
counted for. Agreement with experiment was recov- 
ered when van der Waals interactions were included 
in the calculations, which showed a breaking up of 
the helical structure between 700 and 800 K. 

Drug discovery is a multi-billion dollar business 
and much effort is being put into so called ratio- 
nal drug design where potential drug molecules are 
scored based on their predicted binding affinity to 
a particular protein target .Elli^oillMJ xhis transfers 
the trial- and- error phase away from the lab, where 
experiments to test drug binding affinities can be 
relatively expensive and time-consuming, to the 
computer, where thousands of potential dr ugs can 
be tested for binding at relatively low cost.'^^'^^ 
Working toward this end, Antony et al. studied the 
interactions of a number of protein active sites with 
their respective biological ligands.'^^ They found 
that exclusion of van der Waals interactions can 
substantially change the ordering of ligand bind- 
ing affinities. Further, they found that neglect of 
these interactions can actually lead to the computed 
binding energies for a ligand with its target receptor 
being of the wrong sign. 

Another study, carried out by Rutledge and 
Wetmore focused on ligands that interact with their 
host protein via tt-tt stacking interactions and T- 
shaped vr interactions.'^ As before, they found that 
inclusion of van der Waals effects is imperative to 
obtain accurate energetics in such systems. 

4. Future Directions 

With the utility of these methods established, at- 
tention can be turned toward the future and what 
can be accomplished with them. Computation of a 
full macromolecule in atomistic detail is still beyond 
the reach of DFT, even for the most advanced com- 
puters, but the method can still be used tool 
to aid in our understanding of such systems. 

One useful approach that has been adopted by 
some groups is to use DFT to parameterize new 
force fields. Typically, these are parameterized ei- 
ther to reproduce experimental results or the re- 
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suits of high-level quantum chemistry calculations. 
As discussed in Section [2l quantum chemistry meth- 
ods are limited to fairly small systems. Parameteriz- 
ing force fields using the much larger systems that 
DFT is capable of simulating might help average 
out size effects and better represent the environ- 
ment that exists within macromolecules. Addition- 
ally, solid-state parameter sets could be developed 
to deal with molecular crystals. 

Another useful application of DFT is in the re- 
finement of experimental structures. Typical x-ray 
and NMR techniques provide data that is consis- 
tent with more than one structure. Also problem- 
atic is the placement of the x-ray invisible hydrogen 
atoms. Given this, experimentalists often use semi- 
empirical calculations to refine the observed struc- 
ture. Use of high-level DFT calculations including 
van der Waals interactions could yield a better re- 
sult, since large systems can be calculated very ac- 
curately. 

Drug discovery is another area where useful 
progress is being made by incorporating DFT calcu- 
lations and it is expected tha t DF T will play an im- 
portant role in this area soon.^^^ Although an entire 
protein may not be able to be treated quantum me- 
chanically, hybrid methods that apply varying levels 
of theory to regions within a protein are being used 
with much success. One can treat the drug molecule 
and its binding site with full quantum-mechanical 
rigor while treating distant regions using well-tested 
classical or semi-empirical approaches. This allows 
the most important physics to be treated accurately 
and coupled to a sufficient treatment of the less 
important parts of the problem. This is not only 
a useful approach for drug design but also applies 
to understanding the normal operation of ligand- 
binding proteins. Such methods are general referred 
to as QM/MM, i.e. quantum mechanics/molecular 
mechanics. 

Finally, although the applicability of DFT to 
calculations on full macromolecules is currently lim- 
ited, linear scaling DFT methods are becoming pop- 
ular and provide a tantalizing way forward. These 
approaches, which make use of special algorithms 
and highly- localized basis functions, can easily treat 
thousands of atoms — see Fig. [2j Such capabilities 
make computation of full macromolecular systems 
feasible. For example, the fledgling linear-scaling 
code ONETEP has been used to calculate proper- 
ties of a 20 base-pair strand of DNA containing al- 
most 1300 atoms. If augmented with the ability to 



adequately treat dispersion interactions, such lin- 
ear scaling DFT approaches may provide a practical 
means to apply full quantum mechanics to biologi- 
cal problems of real interest in the near future. 
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